Abstract

This work addresses the problem of measuring how many languages a person
“effectively” speaks, given that some of the languages are close to each other.
In other words, it assigns a meaningful number to her language portfolio.

Intuition says that someone who speaks Spanish and Portuguese fluently is
linguistically less proficient than someone who speaks Spanish and Chinese
fluently, since it takes more effort for a native Spanish speaker to learn
Chinese than Portuguese. As the number of languages grows and their proficiency
levels vary, it gets even more complicated to assign a score to a language
portfolio.

In this article we propose such a measure (the “linguistic quotient”, LQ) that
can account for these effects.

We define the properties that such a measure should have. They are based
on the idea of coherent risk measures from mathematical finance.

Having laid down the foundation, we propose one such measure together
with an algorithm that takes a language classification tree as input.

The algorithm together with its input is available online at lingvometer.com.

1 Introduction

If we aim to compare different language portfolios (consisting of different
languages at different proficiency levels), we need some way to measure them.
Similar to IQ, which is supposed to measure intelligence, we need an “LQ” that
measures linguistic intelligence.

For the scope of this article, linguistic intelligence is the achieved
proficiency in some set of languages, i.e. not the potential to learn a new one
quickly, but an achievement that has already taken place.

The main idea is that languages that are related to each other contribute less
to linguistic intelligence than those that are unrelated.

We will make active use of ideas and methods from finance. A sample of
languages resembles, to some extent, a portfolio of assets.

For a given portfolio of assets, it is more valuable to add an asset that is
uncorrelated or even negatively correlated with the assets already in the
portfolio (e.g. [Markowitz 52]).

In the same way, for a given sample of languages (henceforth, a portfolio of
languages) it is more valuable (in the sense of linguistic intelligence) to add
a language that is not related to the languages already in the portfolio.

The article is organized as follows. In Section 3.1 we design the measure
from a set of properties that sound reasonable from an intuitive and logical
point of view.

In Section 3.2 we turn this design into a mathematical model that allows us
to calculate a score for a given portfolio of languages (it can be calculated
for any language profile at lingvometer.com).

2 Properties of linguistic intelligence 

We reason about the properties that linguistic intelligence should have. Based
on these principles we will later derive the formal rules.

We consider all languages to be equal to each other. This is not a trivial
assumption, since there are languages that have been developing rich literary
and oral traditions over thousands of years, serving as a medium for scientific
expression etc.

There are other languages that fail to compete with the former on this
account. It is tempting to say that they are less valuable. However, we still
regard them as equal, since the “undeveloped” languages could be harder to
acquire exactly because they do not have a written tradition.

People master languages to different extents. Assume that the proficiency
degree of a grown-up, educated native speaker is 1. Someone who has never
encountered the language has level 0 in it. It is logical to demand that
increasing the proficiency in some language of the portfolio increases the
LQ of the person.

We set the measure of a portfolio of n independent languages to be n, i.e.
the LQ of the portfolio of Spanish and Chinese would be 2.

Consequently, the measure of a portfolio of n languages, some of which are
related to each other, is less than n. Further, if we add to a portfolio
a language that is already there, then the measure of the portfolio must remain
the same.

Thus, we would assign to the portfolio of two languages a number between 
1 and 2. The closer they are, the closer is this number to 1. The further they 
are, the closer is this number to 2. 

We interpret such a measure as the “real” (or effective) number of languages 
a person speaks. 

Thus, one and one is two if we add Spanish to Chinese. If we add Spanish 
and Portuguese, then one and one could be something like 1.3, maybe a little 
bit less or a little bit more. 

Besides the time needed to learn a new language, there are other reasons to
assign a higher score to Spanish+Chinese than to Spanish+Portuguese. We could
also argue that by learning a distant language one also learns new structures
that are absent from the related language.

3 Formal approach

In this section we discuss reasonable properties of a linguistic intelligence
measure. Further, we formalize these properties as axioms and propose a formula
that satisfies them.

3.1 First principles 

We record our considerations on how the LQ measure should behave in the
following list of axioms. It was inspired by the idea of coherent risk measures
[Artzner et al. 99] from financial mathematics.

Let a language $l$ be an element of the language space $L$, and let a weighted
language $w$ (in other words, a language with a proficiency level) be an element
of the space $W := L \times [0,1]$, that is, a language and a number between 0
and 1 (1 is a fluent command of the language, 0 means no knowledge at all).

A portfolio $\Pi$ of $N$ languages is an element of the space $W^N$. If needed,
we can also consider a portfolio $\Pi$ of $N$ languages to be an element of the
space $L^N$ (i.e. languages at fluent proficiency).

Definition. A linguistic intelligence measure (l.i.m.) is a function
$\lambda : W^N \to \mathbb{R}$.

If we set all proficiency levels to 1, then the l.i.m. is also defined on $L^N$,
$\lambda : L^N \to \mathbb{R}$.

Now, we formalize the arguments of the previous section as axioms.

We consider all languages to be equal:

Axiom E. Equivalence. $\forall l \in L,\ l \neq \emptyset : \lambda(l) = 1$

A portfolio of languages weighs at most as much as the sum of its components:

Axiom S. Subadditivity. For any two language portfolios it must hold:

$\forall \Pi_1, \Pi_2 \in W^N : \lambda(\Pi_1 \cup \Pi_2) \le \lambda(\Pi_1) + \lambda(\Pi_2)$

Axiom ND. No double-counting. For a language $l$ that is in portfolio $\Pi$ it
must hold:

$l \in \Pi \Rightarrow \lambda(\Pi \cup l) = \lambda(\Pi)$

Axiom I. Independence. For a language $l$ that is independent of every language
in portfolio $\Pi$ it must hold:

$\Pi \perp l \Rightarrow \lambda(\Pi \cup l) = \lambda(\Pi) + \lambda(l)$

Axiom PH. Positive homogeneity. The proficiency effect is linear:

$\forall c \in [0,1],\ \forall l \in L,\ w = (l, c) \in W : \lambda(w) = c\,\lambda(l)$

Definition. A linguistic intelligence measure is called coherent (c.l.i.m.) if
it satisfies axioms E, S, ND, I and PH.

There are many measures that satisfy these axioms. We propose one in the
next section.

Even though a c.l.i.m. seems to be strictly defined, since there are many
requirements to be met, two key points in the whole construction still leave
room for interpretation: how the language space is constructed, and how the
set-theoretic operations (independence and union) between elements of this
space are defined. Later on, we will see two completely different construction
approaches, and further constructions are still possible.

Further, we look at the axioms from different points of view to understand
them better. If axiom E holds, then axioms S, ND and I lead to the
following inequality:

$\forall \Pi \in L^N,\ \forall l \in L : \lambda(\Pi) \le \lambda(\Pi \cup l) \le \lambda(\Pi) + 1$

Thus, adding a language to a portfolio adds a number between 0 and 1 to 
linguistic intelligence. 
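
The upper bound can be traced back to the axioms in one step. The following is a
sketch of that step only, written with $\lambda$ for the measure, $\Pi$ for a
portfolio of fluent languages and $l$ for a single language, as in the axioms
above:

```latex
\begin{align*}
\lambda(\Pi \cup l) &\le \lambda(\Pi) + \lambda(l) && \text{(axiom S)} \\
                    &= \lambda(\Pi) + 1            && \text{(axiom E)}
\end{align*}
```

Both ends of the range are attained: if $l \in \Pi$, axiom ND gives
$\lambda(\Pi \cup l) = \lambda(\Pi)$, and if $\Pi \perp l$, axioms I and E give
$\lambda(\Pi \cup l) = \lambda(\Pi) + 1$.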

This inequality helps us to restrict the range of the l.i.m., s.t.
$\lambda : L^N \to [1, N]$,

or in terms of $W$ (with the PH axiom) $\lambda : W^N \to [0, N]$.

Simplifying the inequality further, we can write it for a portfolio of two
languages as follows:

$\forall l_1, l_2 \in L,\ \Phi = (l_1 \cup l_2) \in L^2 : 1 \le \lambda(\Phi) \le 2$

If for some reason the space $L$ is defined in such a way that it includes an
empty element, then axiom E must be adjusted such that $\lambda(\emptyset) = 0$.

3.2 LQ 

We consider a portfolio of languages to be a tree with a hypothetical Tower of
Babel (ToB) language as its source. A portfolio consisting of only one language
is a path from the source to the language through the classification by
language families, groups, subgroups etc.

The children of ToB are language families like Indo-European or Sino-Tibetan.
This is the layer of nodes of rank 1. Languages are considered independent if
they belong to different language families.

Thus, the language space is the full tree of all ($N$) languages. The
languages are the leaves of the tree, and portfolios are the induced subgraphs
of the full graph containing the source (ToB), the leaves and the paths between
them. Thus, a portfolio of one language is a path from the source through
all families, sub-families, groups etc. to this language. The union of two
portfolios is the union of the corresponding subgraphs.

To illustrate this, consider a portfolio $\Pi$ consisting of Chinese and Serbian
and a portfolio $\Phi$ consisting of English and Slovene. Then the unified
portfolio $T = \Pi \cup \Phi$ contains all four languages. The situation is
illustrated in Figure 1.

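Let $N$ be the maximal depth of the tree and $V_r$ the set of all nodes of depth
$r$. Initialize all languages with their proficiencies (they do not necessarily
all hang in the deepest layer).

Starting from the deepest layer and moving up to the source, calculate the LQ
of each node in the layer iteratively (bottom-up).

Denote the LQ of node $v$ by $\lambda_v$ and the set of children of node $v$ by
$Ch(v)$. For $r = N, \dots, 0$ and $\forall v \in V_r$ with $Ch(v) \neq \emptyset$:

$$\lambda_v = \Big( \sum_{c \in Ch(v)} \lambda_c^{\sqrt{r+1}} \Big)^{1/\sqrt{r+1}}$$

Here $r+1$ is the rank of the layer that contains the children of $v$; nodes
without children keep their initialized value.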

We define the result of the last step of this iterative process as the LQ
measure. Pseudocode for the algorithm, in both the straightforward and the
recursive form, is given in the Appendix.

Definition. The LQ-measure is the l.i.m. such that $LQ = \lambda_v,\ v \in V_0$.

Figure 1: Union of portfolios

There is only one node in the layer of depth 0, namely the root of the tree
(the Tower of Babel language). A numerical example is given in the Appendix.
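
The bottom-up rule can be sketched in Python as a recursion over a nested
dictionary. This is a minimal sketch under assumptions, not the lingvometer.com
implementation: the tree encoding (inner nodes as dicts, leaves as proficiency
values), the function name `lq` and the simplified classification paths are
ours, and the exponent is taken as the square root of the rank of the children's
layer, which agrees with the numerical example in the Appendix.

```python
from math import sqrt

def lq(node, depth=0):
    """LQ of a (sub)tree; `node` is a dict of children or a leaf proficiency."""
    if not isinstance(node, dict):
        return float(node)              # leaf: proficiency in [0, 1]
    p = sqrt(depth + 1)                 # exponent from the rank of the children's layer
    total = sum(lq(child, depth + 1) ** p for child in node.values())
    return total ** (1.0 / p)

# Hypothetical, heavily simplified classification paths (assumption).
spanish_only = {"Indo-European": {"Romance": {"Spanish": 1.0}}}
spanish_chinese = {
    "Indo-European": {"Romance": {"Spanish": 1.0}},
    "Sino-Tibetan": {"Sinitic": {"Chinese": 1.0}},
}
spanish_portuguese = {
    "Indo-European": {"Romance": {"Spanish": 1.0, "Portuguese": 1.0}},
}

print(lq(spanish_only))        # a single fluent language
print(lq(spanish_chinese))     # two independent languages
print(lq(spanish_portuguese))  # two related languages
```

In this toy tree the single language scores 1, the two independent languages
score 2, and the related pair lands strictly between 1 and 2; the exact value
for the related pair depends on how deep their common ancestor sits.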

In the formula, we raise the LQs of the nodes to the power of the square root
of the rank rather than to the power of the rank itself, as on real data this
produces more intuitive results. In fact, replacing the square root by any
monotone function $f$ of the rank with a fixed point at 1 (i.e. $f(1) = 1$)
would give another c.l.i.m.:

$$\lambda_v = \Big( \sum_{c \in Ch(v)} \lambda_c^{f(r+1)} \Big)^{1/f(r+1)}$$
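
To get a feeling for the choice of $f$, the following sketch evaluates a single
node for two candidate functions. The helper `node_lq` and its signature are
hypothetical; `rank` denotes the rank of the layer in which the children sit.

```python
from math import sqrt

def node_lq(child_lqs, rank, f=sqrt):
    """Combine the children's LQs with exponent f(rank)."""
    p = f(rank)
    return sum(x ** p for x in child_lqs) ** (1.0 / p)

siblings = [1.0, 1.0, 1.0]   # three fluent sibling languages in a layer of rank 5

print(round(node_lq(siblings, 5, f=sqrt), 2))          # exponent sqrt(5): 1.63
print(round(node_lq(siblings, 5, f=lambda r: r), 2))   # exponent 5: 1.25
print(node_lq([1.0, 1.0], 1, f=sqrt))                  # rank 1: plain sum, 2.0
```

With the rank itself as exponent, three closely related languages count for
noticeably less than with its square root, while at rank 1 both choices reduce
to a plain sum because $f(1) = 1$.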

LQ turns out to be coherent. 

Lemma. LQ is c.l.i.m. 

Proof. Axiom S. Consider the language portfolios $\Phi$ and $\Pi$, which may
have common elements (they have at least ToB in common). We can write the
children of a node $v$ as the union of its children belonging to $\Pi$,
$Ch^{\Pi}(v)$, and its children belonging to $\Phi$, $Ch^{\Phi}(v)$. Then the
algorithm to calculate $LQ(\Pi \cup \Phi)$ looks as follows (again starting with
the nodes from the bottom), where $p$ denotes the exponent used at node $v$
(the value of $f$ at the corresponding rank):

$$\lambda_v = \Big( \sum_{c \in Ch^{\Pi}(v) \cup Ch^{\Phi}(v)} \lambda_c^{p} \Big)^{1/p}$$

The algorithms to calculate $LQ(\Pi)$ and $LQ(\Phi)$ are respectively

$$\lambda_v^{\Pi} = \Big( \sum_{c \in Ch^{\Pi}(v)} \lambda_c^{p} \Big)^{1/p}$$

and

$$\lambda_v^{\Phi} = \Big( \sum_{c \in Ch^{\Phi}(v)} \lambda_c^{p} \Big)^{1/p}$$

The numbers $\lambda_c$ form a vector in $\mathbb{R}^{|Ch(v)|}$; its Minkowski
distance to the zero vector with exponent $p$ is $\lambda_v$. In case
$Ch^{\Pi}(v) \cap Ch^{\Phi}(v) = \emptyset$, the triangle inequality states that

$$\lambda_v = \Big( \sum_{c \in Ch^{\Pi}(v) \cup Ch^{\Phi}(v)} \lambda_c^{p} \Big)^{1/p} \le \Big( \sum_{c \in Ch^{\Pi}(v)} \lambda_c^{p} \Big)^{1/p} + \Big( \sum_{c \in Ch^{\Phi}(v)} \lambda_c^{p} \Big)^{1/p}$$

If the intersection of the children for this node is not empty, then we have
additional terms on the right side of the inequality and, thus, it still holds.
Note that on the right side of the inequality we have the components that enter
the calculation of $LQ(\Pi)$ and $LQ(\Phi)$ respectively.

Thus, going from the bottom to the top, we finally have

$LQ(\Pi \cup \Phi) \le LQ(\Pi) + LQ(\Phi)$

This satisfies axiom S.

Axiom I is satisfied due to the property of the formula that at rank 1
(this level is considered to contain independent entities) the sum of LQs is
linear (i.e. at no cost).

Axioms E and PH are satisfied due to the fact that, initialized with 1 (or
with $c \in [0,1]$ in the case of PH), the LQ of a single language can be pushed
up along the path to the source at no cost.

Axiom ND is satisfied due to the space construction. ■
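
The statements of the lemma can be spot-checked numerically on the portfolios
of Figure 1. The sketch below is an illustration, not a proof. It assumes a
portfolio is stored as a mapping from a classification path to a proficiency,
that the union of two portfolios is the merge of these mappings, and that the
exponent is the square root of the rank of the children's layer; the paths are
simplified and hypothetical.

```python
from math import sqrt

def lq(portfolio):
    """LQ of a portfolio given as {classification_path: proficiency}."""
    def node_lq(prefix):
        if prefix in portfolio:                  # a language (leaf of the subgraph)
            return portfolio[prefix]
        depth = len(prefix)
        # children of this node that lie on some path of the portfolio
        kids = {path[:depth + 1] for path in portfolio if path[:depth] == prefix}
        p = sqrt(depth + 1)                      # rank of the children's layer
        return sum(node_lq(k) ** p for k in kids) ** (1.0 / p)
    return node_lq(())                           # () stands for the ToB source

# Hypothetical, simplified paths; all four languages at fluent level.
CHINESE = ("Sino-Tibetan", "Chinese")
SERBIAN = ("Indo-European", "Slavic", "South", "Western", "Serbian")
SLOVENE = ("Indo-European", "Slavic", "South", "Western", "Slovene")
ENGLISH = ("Indo-European", "Germanic", "English")

pi = {CHINESE: 1.0, SERBIAN: 1.0}
phi = {ENGLISH: 1.0, SLOVENE: 1.0}
union = {**pi, **phi}                            # union of the two subgraphs

print(lq(pi), lq(phi), lq(union))
assert lq(union) <= lq(pi) + lq(phi)             # axiom S on this example
assert abs(lq({**pi, **pi}) - lq(pi)) < 1e-12    # axiom ND: re-adding changes nothing
```

Storing portfolios as path sets makes the union a plain dictionary merge, which
mirrors the "union of subgraphs" definition; how to merge differing
proficiencies for the same language is not specified in the text and is left
out here.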

A shortcoming of the approach is that some language dependencies are not
captured by the tree structure. For instance, the direct French influence on
English could be represented by an edge between them. This would, however,
destroy the tree structure.

One of the advantages is that language tree data is easily available, for
instance in [ethnologue 2015].

One can try the algorithm on different inputs at lingvometer.com.

3.3 Matrix approach 

In the previous section we saw a measure constructed on language trees.
However, it does not have to be trees. In this section we discuss an
alternative approach, namely one based on correlation matrices.

The entries of the matrices represent a correlation or language distance (more
precisely, 1 minus the distance, to make it look like a correlation) expressed
as a number between 0 and 1. There are many works on measuring such a distance,
to name a few: [Petroni and Serva 2010], [Delsing and Akesson 2005],
[Koehn 2005], [Chiswick and Miller 2005], [Gingsburgh and Weber 2011]. They use
different approaches: difficulty of acquiring a language, difficulty of machine
translation between languages, number of words in common etc. Most of them cover
only a small part of all possible correlations/distances. With $N$ languages,
there must be $N(N-1)/2$ distances.

Just as stock prices are correlated, with the information stored in
correlation matrices, we can also consider languages to be correlated.

We try to construct an l.i.m. on $2 \times 2$ matrices, i.e. portfolios of two
languages. The correlation matrix in this case looks like this:

$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

Denote it as $M(\rho)$.

Languages are independent if $\rho = 0$ and become more dependent as $\rho$
approaches 1, with the two languages being equal if $\rho = 1$.

In order for the l.i.m. to be coherent, we need among other things
$\lambda(M(0)) = 2$ and $\lambda(M(1)) = 1$, with $\lambda$ monotonically
decreasing in $\rho$.

One such c.l.i.m. is $\lambda(M(\rho)) = 2 - \rho$. Axioms I and ND were checked
above. The matrix of one language is simply 1; thus, axiom E holds as well.
Axiom PH is not relevant on the space $L^N$ (it is relevant only on $W^N$
spaces). Axiom S holds since $2 - \rho \le 1 + 1$ ($0 \le \rho \le 1$).

It is not the only c.l.i.m. on this space, since a family of c.l.i.m. can be
constructed like this: $\lambda(M(\rho)) = 2 - \rho^r,\ r > 0$.
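
The boundary conditions for this family are easy to check numerically. The
following sketch is only a sanity check on a grid of $\rho$ values; the function
name `lim_2x2` is hypothetical and the correlation value in the last line is an
arbitrary illustration, not a measured language distance.

```python
def lim_2x2(rho, r=1.0):
    """Candidate measure 2 - rho**r for two fluent languages with correlation rho."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if r <= 0:
        raise ValueError("r must be positive")
    return 2.0 - rho ** r

for r in (0.5, 1.0, 2.0):
    assert lim_2x2(0.0, r) == 2.0        # independent languages (axiom I)
    assert lim_2x2(1.0, r) == 1.0        # identical languages (axiom ND)
    values = [lim_2x2(k / 10, r) for k in range(11)]
    assert all(a >= b for a, b in zip(values, values[1:]))   # decreasing in rho
    assert all(1.0 <= v <= 2.0 for v in values)              # at most 1 + 1 (axiom S)

print(lim_2x2(0.7))   # two closely related languages under r = 1
```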

We discussed a particular case that shows an example of c.l.i.m. on the 
matrix space. The general case is still open. 

One of the shortcomings of the matrix approach is that data on all
$N \times (N-1)$ dependencies is not available. Different studies assign close,
but still different, numbers. Apparently, the tree data can be mapped to a
matrix and vice versa (e.g. [Petroni and Serva 2008]).


4 Conclusion 

We presented a sound way to measure a portfolio of languages. How could it be
used other than for measuring someone's linguistic intelligence?

There is a broad field of research at the intersection of economics and
linguistics. An extensive overview can be found in [Grin 2003].

LQ could be used, for instance, as a communication cost function. An
institution that evaluates several options for its working language(s) could aim
to minimize the overall LQ of its members, since this would also mean minimizing
communication costs. For example, the optimal language (or an optimal bundle of
languages) for the European Union could be chosen in such a way that the
aggregated LQ of the European population is minimal.

The implementation of the LQ algorithm can be found at lingvometer.com.

6 Appendix

6.1 Pseudocode for LQ algorithm 

The pseudocode is presented in Python style. First, initialize all leaves with
an LQ equal to the proficiency level of the corresponding language. Note that
the leaves can lie in different layers and, thus, have different ranks:

    for node in nodes:
        if node is leaf:
            node.lq = node.language.proficiency

The calculation:

    # start with the layer above the deepest one, end with the source (rank 0)
    for rank in range(deepest_rank - 1, -1, -1):
        p = sqrt(rank + 1)  # exponent: root of the rank of the children's layer
        for node in nodes_of_layer(rank):
            if children(node):  # leaves keep their proficiency
                node.lq = sum(power(child.lq, p) for child in children(node))
                node.lq = power(node.lq, 1 / p)

The result is

    LQ = nodes_of_layer(0)[0].lq

We also write down the recursive version of the algorithm. The initialization
step is the same. The recursive function that does the job could be
implemented like this:

    def lq_recursive(node):
        if children(node) is empty:
            return node.language.proficiency
        else:
            p = sqrt(node.rank + 1)  # root of the rank of the children's layer
            temp_sum = 0
            for child in children(node):
                temp_sum += power(lq_recursive(child), p)
            return power(temp_sum, 1 / p)

Then we can calculate LQ with the following call:

    LQ = lq_recursive(ToB)

6.2 Numerical Example to LQ-Tree 

We introduce here an example of a language profile that is not trivial, but
also not very complex. It contains patterns that test the first principles.
Someone speaks Serbian, Slovene, Croatian and Chinese fluently. Besides, he
has some command of English that qualifies at the 50% level. We also do the
arithmetic and show every step of the calculation. Consider the following
language portfolio. To initialize the algorithm, we assign a rank to each layer
and set the LQ to 1 for each language except for English, where we set the LQ
to 0.5 (Fig. 2).

Figure 2: Initialization


The deepest layer (that of Serbian, Slovene and Croatian) is of rank 5. Now,
calculate the LQ for the Western branch of the South Slavic languages (Fig. 2):

$$\lambda_{Western} = \big(1^{\sqrt{5}} + 1^{\sqrt{5}} + 1^{\sqrt{5}}\big)^{1/\sqrt{5}} = 3^{1/\sqrt{5}} \approx 1.63$$

If a node has just one child, then according to the formula it takes over the
child's LQ unchanged.

As another example, take the node of the Indo-European family (the rank of
the layer is 2). It has two children, namely the Germanic and Slavic groups,
with LQs calculated at previous iterations equal to 0.5 and 1.63 respectively:

$$\lambda_{Indo\text{-}European} = \big(0.5^{\sqrt{2}} + 1.63^{\sqrt{2}}\big)^{1/\sqrt{2}} \approx 1.84$$

At the final step we can sum the LQs of the language families at no cost:
$LQ = 1.84 + 1 = 2.84$
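
The arithmetic of this example can be replayed with a few lines of Python. This
is a sketch under the assumption that a node combines its children with the
exponent given by the square root of the rank of the children's layer; the
helper `combine` is hypothetical. Note that the value 1.84 is obtained when the
Slavic branch is rounded to 1.63 before the next step, as in the text; carrying
full precision through gives a total of about 2.85 instead of 2.84.

```python
from math import sqrt

def combine(child_lqs, rank):
    """LQ of a node whose children sit in the layer of the given rank."""
    p = sqrt(rank)
    return sum(x ** p for x in child_lqs) ** (1.0 / p)

western = combine([1.0, 1.0, 1.0], rank=5)    # Serbian, Slovene, Croatian
slavic = western                               # single-child nodes pass the LQ up unchanged
germanic = 0.5                                 # English at 50%, passed up unchanged
sino_tibetan = 1.0                             # Chinese, passed up unchanged

indo_european = combine([germanic, slavic], rank=2)
total = combine([indo_european, sino_tibetan], rank=1)   # rank 1: plain sum

# Same step with the intermediate value rounded to 1.63, as in the text.
indo_european_rounded = combine([0.5, 1.63], rank=2)

print(round(western, 2))                 # 1.63
print(round(indo_european_rounded, 2))   # 1.84
print(round(total, 4))                   # full-precision total
```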

The whole tree with the LQs for each node is shown in Fig. 3.